NSF PAR Search | NSF Public Access Repository

Adaptive and robust multi-task learning

https://doi.org/10.1214/23-AOS2319

Duan, Yaqi; Wang, Kaizheng (October 2023, The Annals of Statistics)

We study the multitask learning problem that aims to simultaneously analyze multiple data sets collected from different sources and learn one model for each of them. We propose a family of adaptive methods that automatically utilize possible similarities among those tasks while carefully handling their differences. We derive sharp statistical guarantees for the methods and prove their robustness against outlier tasks. Numerical experiments on synthetic and real data sets demonstrate the efficacy of our new methods.
Full Text Available
Learning Good State and Action Representations for Markov Decision Process via Tensor Decomposition

Ni, Chengzhuo; Duan, Yaqi; Dahleh, Munther; Wang, Mengdi; Zhang, Anru R. (February 2023, Journal of machine learning research)

Full Text Available
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism

Yin, Ming; Duan, Yaqi; Wang, Mengdi; Wang, Yu-Xiang (April 2022, International Conference on Learning Representations)

Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Because appropriate function approximators can help mitigate the sample complexity burden in modern reinforcement learning problems, existing endeavors usually employ powerful function representation models (e.g., neural networks) to learn the optimal policies. However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear. Towards this goal, we study the statistical limits of offline reinforcement learning with linear model representations. To derive a tight offline learning bound, we design variance-aware pessimistic value iteration (VAPVI), which uses the conditional variance information of the value function for time-inhomogeneous episodic linear Markov decision processes (MDPs). VAPVI leverages estimated variances of the value functions to reweight the Bellman residuals in the least-squares pessimistic value iteration and provides improved offline learning bounds over the best-known existing results (in which the Bellman residuals are equally weighted by design). More importantly, our learning bounds are expressed in terms of system quantities, which provide natural instance-dependent characterizations that previous results lack. We hope our results draw a clearer picture of what offline learning should look like when linear representations are provided.
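The reweighting idea in this abstract can be illustrated with a single variance-weighted ridge regression, in which each sample's squared residual is divided by its estimated variance. This is only a minimal sketch under stated assumptions, not the paper's VAPVI: the function name `variance_weighted_ridge`, its parameters, and the regularization value are hypothetical, and the variance estimation, the pessimism penalty, and the backward iteration over time steps are all omitted.

```python
import numpy as np

def variance_weighted_ridge(features, targets, variances, reg=1.0):
    """Solve min_w sum_i (phi_i^T w - y_i)^2 / var_i + reg * ||w||^2.

    features:  (n, d) array of feature vectors phi_i (hypothetical input)
    targets:   (n,) array of regression targets y_i
    variances: (n,) array of positive variance estimates var_i
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    d = features.shape[1]
    # Weighted Gram matrix: sum_i phi_i phi_i^T / var_i + reg * I
    gram = features.T @ (features * weights[:, None]) + reg * np.eye(d)
    # Weighted right-hand side: sum_i phi_i y_i / var_i
    rhs = features.T @ (weights * targets)
    return np.linalg.solve(gram, rhs)
```

Setting every variance to 1 recovers ordinary ridge regression, which corresponds to the equally weighted case mentioned in the abstract.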
Full Text Available
Learning Good State and Action Representations via Tensor Decomposition

https://doi.org/10.1109/ISIT45174.2021.9518158

Ni, Chengzhuo; Zhang, Anru R.; Duan, Yaqi; Wang, Mengdi (July 2021, 2021 IEEE International Symposium on Information Theory (ISIT))

Full Text Available
State aggregation learning from Markov transition data

Duan, Yaqi; Ke, Zheng Tracy; Wang, Mengdi (October 2019, Advances in neural information processing systems)
State aggregation is a popular model reduction method rooted in optimal control. It reduces the complexity of engineering systems by mapping the system’s states into a small number of meta-states. The choice of aggregation map often depends on the data analysts’ knowledge and is largely ad hoc. In this paper, we propose a tractable algorithm that estimates the probabilistic aggregation map from the system’s trajectory. We adopt a soft-aggregation model, where each meta-state has a signature raw state, called an anchor state. This model includes several common state aggregation models as special cases. Our proposed method is a simple two-step algorithm: the first step is a spectral decomposition of the empirical transition matrix, and the second step applies a linear transformation to the singular vectors to find their approximate convex hull. It outputs the aggregation distributions and disaggregation distributions for each meta-state in explicit forms, which are not obtainable by classical spectral methods. On the theoretical side, we prove sharp error bounds for estimating the aggregation and disaggregation distributions and for identifying anchor states. The analysis relies on a new entry-wise deviation bound for singular vectors of the empirical transition matrix of a Markov process, which is of independent interest and cannot be deduced from existing literature. The application of our method to Manhattan traffic data successfully generates a data-driven state aggregation map with nice interpretations.
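The first step named in this abstract, a spectral decomposition of the empirical transition matrix, can be sketched roughly as follows. This is an illustration under assumptions rather than the authors' procedure: it uses plain row-normalized transition counts and an unmodified SVD, the function names are hypothetical, and the second step (the linear transformation that locates the approximate convex hull and the anchor states) is not attempted.

```python
import numpy as np

def empirical_transition_matrix(trajectory, num_states):
    """Row-normalized transition counts from one trajectory of state indices."""
    counts = np.zeros((num_states, num_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of states that are never left stay all-zero (a choice made for this sketch).
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def leading_singular_vectors(p_hat, r):
    """Top-r left singular vectors, singular values, and right singular vectors."""
    u, s, vt = np.linalg.svd(p_hat, full_matrices=False)
    return u[:, :r], s[:r], vt[:r, :].T

# Toy usage: a short trajectory over 3 states, keeping r = 2 components.
trajectory = [0, 1, 0, 1, 2, 0]
p_hat = empirical_transition_matrix(trajectory, num_states=3)
left, sing_vals, right = leading_singular_vectors(p_hat, r=2)
```

Here `r` plays the role of the number of meta-states, which this sketch assumes is known in advance.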
Full Text Available
